-
Notifications
You must be signed in to change notification settings - Fork 4.5k
ledger-tool: verify: add --record-slots and --verify-slots #34246
ledger-tool: verify: add --record-slots and --verify-slots #34246
Conversation
This is #29189 with the following changes:
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #34246 +/- ##
========================================
Coverage 81.7% 81.7%
========================================
Files 834 834
Lines 224299 224830 +531
========================================
+ Hits 183361 183835 +474
- Misses 40938 40995 +57 |
Thanks for picking this up! Since I did the original patch, this was added to the validator and ledger tool #32632. So it seems that this change is now kinda redundant? One can achieve the same by doing a first run with --write-bank-file, diff the bank files, then use --halt-at-slot to stop at the offending slot. Even if it's a couple extra steps, the bank file includes fields that likely make debugging the divergence faster. |
A few notes about the
I uploaded one of these files here for the sake of seeing what exactly it contains: |
Oh I see, that's a lot larger than I thought. Maybe we can add an option to opt out of including the actual account data?
Yeah for the purpose of smoke testing the runtime I usually replay very large ledgers, the more slots the better really. And data for each slot would have to be generated, preferably by changing the format to include multiple slots, dumping hundreds of files doesn't seem ideal.
But so at the moment you must still separately figure out which slot diverges, then generate a bank file for it and diff right? If the format was changed to include multiple slots in order, then from the diff itself it would be possible to tell when the divergence happened I think right? What I'm thinking is this: I've used the code in this PR many times and it helped me find many runtime bugs early, so obvs I would like to have the functionality in some form. The way I've used it was to find the offending broken slot, then add logs to strategic places to figure out diffs hashes etc... which --write-bank-file conveniently outputs to a file. So somehow merging these two features makes a lot of sense I think? |
I'm onboard with this, and this idea would be inline with another idea I had to optionally include transaction information. The transaction information would further float size so not great for a total test bench, but potentially useful in automating some of the tedious work of honing in on the bug.
Yep, understood
Yeah, I think that is reasonable as well. As you can see, there is some metadata that won't change (namely the version string but could add more down the road). But, we could add everything else into a
Got it. At a high level, I see two current cases for this feature:
From above, 1. is more me telling you how I use it; do you think 2. accurately captures how you would want to use this feature if we incorporate the "optional" details? |
I'm making changes to the runtime and testing with this feature. Hopefully by the time I'm done I'll have some bright ideas, or maybe not. We'll see |
Sounds good! I'll be out next week (Dec 4-8), but back the week after (Dec 11). Can review then, or if you still have other things you're working on (or other things that you would rather work on), I could also take this task off of your hands as well. |
Yep makes sense
Yep 2 is exactly what I'd want! :) |
Looks like we've settled on a design, it does look good. I'll implement it |
dd47f23
to
efd77f3
Compare
efd77f3
to
53a74c4
Compare
4e0b427
to
57ed4a9
Compare
Here is the re-worked version, that I've been using for debugging #34057. There is a new option Please let me know what you think. |
41f85e8
to
678940b
Compare
Hi Sean, I'll take a look today. One request for the future - could you please include new changes in separate commits as opposed to combining with the existing commit ? Even if they end up getting combined on merge, it is nice to be able to flip through the commit-by-commit diffs (ie if I want to see what has changed with your latest update only) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking/thinking about this! I'm glad you are also interested / see value in adding information for mid-slot (ie tx-by-tx); I've done something similar and have heard others express value in getting that. The general direction of this PR is good, maybe just some tweaks to how we specifically do several things.
That being said, I think we should take an incremental approach to keep PR's smaller
- I have a PR that extends the existing
BankHashDetails
to allow multiple slots: Allow bank hash file to contain details for multiple slots #34516 - With that, we could update the file to optionally include / exclude the accounts contents
- The current
BankHashDetails
file includes them by default, but we probably want to exclude those details if we want to generate a file for 1000's of slots
- The current
- Rebase this PR on top of the above two and enable ledger-tool to create / read those files
- In a subsequent PR, add support for storing "intermediate" (ie tx-by-tx) details or other improvements
What do you think ?
678940b
to
8013ba6
Compare
8013ba6
to
edcbb88
Compare
Makes a lot of sense. Now this PR does not include transactions but does optionally include the bank, and writes the new |
258d0eb
to
4aa9daa
Compare
4aa9daa
to
4e7f09d
Compare
// writing the json file ends up with a syscall for each number, comma, indentation etc. | ||
// use BufWriter to speed things up | ||
|
||
let writer = std::io::BufWriter::new(recorded_slots_file); | ||
|
||
serde_json::to_writer_pretty(writer, &bank_hashes).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is duplicate code with stuff we have in bank_hash_details.rs
, would be nice to single source that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn write_bank_hash_details_file()
write a single bank, and here we are writing a Vec<BankHashDetails>
. Also fn write_bank_hash_details_file()
is used from fn dump_then_repair_correct_slots()
as well.
I am not sure how I can consolidate the two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah, maybe not as easy as I thought. I'd say we can leave it for now
The long term vision is implementing something like BankHashDetailsStreamer
. This object could hold the serializer, and then dump out details bank-by-bank. Will definitely help out if someone runs the command with "accounts" mode on a large number of slots.
You could specify the config at the start, the object could retain the config. Then, when you call stream_bank()
or whatever we decide to name it, the streamer extracts appropriate details (just the bank hash OR all the hashes + accounts).
To the end user, this is mostly invisible tho. So, like I said, I'm happy to push the PR as-is and then potentially go refactor after the fact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that makes sense. For virtual machine debugging it would be super-useful to have the transactions in the file too. Let start with this and go from there
88a01d0
to
0f9eec7
Compare
ledger-tool/src/main.rs
Outdated
// .default_value() does not work with .conflicts_with() in clap 2.33 | ||
// .conflicts_with("verify_slots") | ||
// https://github.com/clap-rs/clap/issues/1605#issuecomment-722326915 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you mentioned, would be nice if we were on recent CLAP so this bug didn't hinder us.
Instead, can we do the check ourselves instead. Ie, something like:
if arg_matches.is_present("record_slots") && arg_matches.is_present("verify_slots") {
eprintln!("Some text, maybe mirror what the CLAP text would be if there weren't that bug");
}
What do you think ? Also, if we do this, we should do it fairly early in the "verify" match case, certainly before calling load_and_process_ledger_or_exit!()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point. I've added it.
e707b53
to
20a8add
Compare
This adds: --record-slots <FILENAME> Write the slot hashes to this file. --record-slots-config hash-only|bank Store the bank (=accounts) json file, or not. --verify-slots <FILENAME> Verify slot hashes against this file. The first case can be used to dump a list of (slot, hash) to a json file during a replay. The second case can be used to check slot hashes against previously recorded values. This is useful for debugging consensus failures, eg: # on good commit/branch ledger-tool verify --record-slots good.json --record-slots-config=bank # on bad commit or potentially consensus breaking branch ledger-tool verify --verify-slots good.json On a hash mismatch an error will be logged with the expected hash vs the computed hash.
20a8add
to
e334f34
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, let's push this (especially with repo changeover) and we can tweak things from there
Thanks @steviez |
…bs#34246) ledger-tool: verify: add --verify-slots and --verify-slots-details This adds: --record-slots <FILENAME> Write the slot hashes to this file. --record-slots-config hash-only|accounts Store the bank (=accounts) json file, or not. --verify-slots <FILENAME> Verify slot hashes against this file. The first case can be used to dump a list of (slot, hash) to a json file during a replay. The second case can be used to check slot hashes against previously recorded values. This is useful for debugging consensus failures, eg: # on good commit/branch ledger-tool verify --record-slots good.json --record-slots-config=accounts # on bad commit or potentially consensus breaking branch ledger-tool verify --verify-slots good.json On a hash mismatch an error will be logged with the expected hash vs the computed hash.
This adds:
The first case can be used to dump a list of (slot, hash) to a json file
during a replay. The second case can be used to check slot hashes against
previously recorded values.
This is useful for debugging consensus failures, eg:
On a hash mismatch an error will be logged with the expected hash vs the computed hash.
Problem
Summary of Changes
Fixes #